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Abstract 

Human genome-wide association studies (GWAS) of longevity attempt to identify alleles at different frequencies in the extremely 
old, relative to a younger control sample. Here, we apply a GWAS approach to "synthetic" populations of Drosophila melanogaster 
derived from a small number of inbred founders. We used next-generation DNA sequencing to estimate allele and haplotype 
frequencies in the oldest surviving individuals of an age cohort and compared these frequencies with those of randomly sampled 
individuals from the same cohort. We used this case-control strategy in four independent cohorts and identified eight significantly 
differentiated regions of the genome potentially harboring genes with relevance for longevity. By modeling the effects of local 
haplotypes, we have more power to detect regions enriched for longevity genes than marker-based GWAS. Most significant regions 
occur near chromosome ends or centromeres where recombination is infrequent, consistent with these regions harboring uncon- 
ditionally deleterious alleles impacting longevity. Genes in regions of normal recombination are enriched for those relevant to immune 
function and a gene family involved in oxidative stress response. Genetic differentiation between our experimental cohorts is com- 
parable to that between human populations, suggesting in turn that our results may help explain heterogeneous signals in human 
association studies of extreme longevity when panels have diverse ancestry. 
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Introduction 

Average human life expectancy in developed nations ranges 
from about 80 to 85 years, and human twin studies suggest 
that only 20-30% of the variation in survival within this range 
is determined by genetic variation (Herskind et al. 1996). 
Extreme longevity, often defined as surviving beyond 100 
years, may be a distinct subphenotype with a larger genetic 
component (Perls and Terry 2003). Survival to extreme ages is 
rare but clusters in families (Perls et al. 2000), and relatives of 
centenarians have a marked delay in age-related diseases 
(Perls et al. 2007). In light of these observations, candidate 
gene studies (Ewbank 2007; Flachsbart et al. 2009) and 
genome-wide scans (Puca et al. 2001; Newman et al. 2010; 
Sebastiani et al. 2012) have been carried out on panels of 
extremely long-lived individuals. Association studies of survival 
to very old age tend to be underpowered, because sample 
sizes are small, typically fewer than 500 individuals, and also 
because panels of long-lived individuals tend to be made up 



of both nonagenarians and centenarians (Tan et al. 2008). 
These studies do not usually implicate the same loci in 
panels with different ancestral backgrounds, with two excep- 
tions: 1) APOE, a known risk factor for cardiovascular and 
Alzheimer's disease (Schachter et al. 1994; Ewbank 2007; 
Sebastiani et al. 2012); and 2) FOX03a, which encodes a reg- 
ulator of the insulin-IGF1 signaling pathway (Anselmi et al. 
2009; Flachsbart et al. 2009; Pawlikowska et al. 2009). 

According to recent US census data, about 0.9% of males 
and 2.8% females survive to age 100 (U.S. Social Security 
Administration, 2009 Census [http://www.ssa.gov/oact/ 
STATS/table4c6.html, last accessed November 30, 2013]). 
Here, we conduct a genome-wide case-control study in 
Drosophila melanogaster that compares the oldest 2% of 
female flies with a similar number of adult females from the 
same cohort sampled at a younger age. This experiment 
mimics human centenarian studies, providing a well-defined 
model for evaluating the prospects of the study of aging using 
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Fig. 1. — Overview of sampled flies, (a) Schematic depicting the establishment of synthetic recombinant populations used in this study (cf. King, 
Macdonald et al. 2012; King, Merkes et al. 2012). A1, A2, B1, and B2 were sampled for this work after approximately 100 generations of random 
mating, (b) Mortality data from females in the four "census" cages. Dead flies were sexed, counted, and removed from each population cage every other 
day. When only -10 surviving females remained, these were collected alive and used for DNA extraction (thus the survivorship curve does not reach zero at 
the end of the assay). 



relative enrichment or deficit of alleles in extremely old co- 
horts. We used "synthetic" recombinant populations of 
Drosophila for which extensive haplotype information is avail- 
able (King, Macdonald et al. 2012; King, Merkes et al. 2012) 
(fig. 1a) to establish large cohorts (fig. / \b) from which we 
collected pools of young control adult females and extremely 
old female flies. We replicated the experiment four times, with 
replicates initiated from two pairs of synthetic populations, 
each pair derived from an independent set of founders (the 
"A" and "B" populations), with the individual populations 
within a pair initiated from the same founders but maintained 



apart for -100 generations (A1 vs. A2 or B1 vs. B2). We then 
used next-generation deep resequencing of these control and 
old panels to compare genetic differentiation between them. 



Materials and Methods 

Experimental Populations 

We used the synthetic recombinant populations of 
the Drosophila Synthetic Population Resource (DSPR) 
(King, Macdonald et al. 2012; King, Merkes et al. 2012) as 
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four aging populations from which to gather genomic data. 
Two independent sets of seven inbred Drosophila lines (the 
"founders") were crossed to initiate two synthetic recombi- 
nant populations A and B; and an additional inbred line was 
used in the founding of both (fig. 1a). Both A and B have been 
then maintained as two independent large populations 
(A1/A2, B1/B2) and, for this experiment, were sampled after 
approximately 100 generations. The founder strains were 
sampled from various geographic locations and are represen- 
tative of genetic variation present in natural Drosophila pop- 
ulations worldwide. All genetic material for each synthetic 
recombinant population is thus derived from just eight foun- 
der haplotypes, and each of the 15 inbred founder strains 
have been resequenced to ~50x coverage. After 100 gener- 
ations of maintenance, the genome of any given recombinant 
individual used to initiate the current experiment is a mosaic of 
the founder chromosomes. As part of the DSPR project, given 
medium-density genome-wide single-nucleotide polymor- 
phism (SNP) genotypes for a collection of recombinant 
inbred lines (RILs) (or individuals) obtained from the synthetic 
populations, hidden Markov models have been developed 
which can determine at any given chromosomal position 
which of the eight founder haplotypes define that segment 
of the genome. By virtue of the founder haplotypes being 
completely resequenced, we thus have the ability to accurately 
infer the complete genome sequence of that line/individual; 
such in silico genome sequences are available for -1,700 
inbred lines derived from generation 50 of the synthetic 
population. 

Longevity Assay 

The four synthetic recombinant populations, which normally 
are kept in large milk bottles (1 .89 I), were expanded and 
standardized using 8-dram culture vials for three generations 
prior to the assay. During the standardization generations, 
flies spent the first 14 days of life developing in vials with 
corn meal-dextrose medium. On day 14, adult flies were 
moved into plexiglass cages and fed media supplemented 
with live yeast paste to stimulate oviposition. Eggs were col- 
lected within 1 2-h oviposition windows to ensure that individ- 
uals in the subsequent generation were as close to the same 
age as possible. During the cohort assay, 120 14-day-old 
females (from egg) were collected from each cohort and 
their DNA extracted in bulk for later genomic library prepara- 
tion ("control" libraries). Approximately 12,000 14-day-old 
individuals were then transferred from vials and equally di- 
vided into 12 plexiglass cages in which to age. Previous 
work has shown that at densities of 1 ,000 flies/cage, fly den- 
sity itself has negligible impacts on mortality (Shahrestani P, 
unpublished data). Flies were fed fresh media every other day, 
and before feeding, dead flies were removed and counted. 
For 1 of the 12 cages per population, flies were removed, 
counted, and sexed every day, for the purpose of generating 



detailed population survivorship curves (fig. 1b). For the 
remaining 11 of the 12 cages per population, flies were re- 
moved and counted every other day; this was to ensure that 
excess dead flies did not accumulate in each cage and also to 
verify the total number of flies allocated to each cage. The last 
surviving -2% of females in each cage were retained (supple- 
mentary table S1 , Supplementary Material online) for genomic 
DNA (gDNA) library preparation ("old" libraries). We chose to 
ignore the oldest living male flies for the purposes of this 
experiment, so downstream results should be interpreted as 
female-specific. 

Genome Sequencing and SNP Identification 

gDNA was extracted from both control and old female pools 
from each population using the Qiagen/Gentra Puregene kit, 
following the manufacturer's protocol for bulk DNA purifica- 
tion. The resulting eight gDNA pools were prepared as stan- 
dard paired-end adapter libraries and each run as single PE75 
lanes on an lllumina HiSEQ 2000. Raw reads were aligned to 
the reference genome sequence of D. melanogaster using 
bwa (Li and Durbin 2010). We then used samtools mpileup 
and the open-source code PoPoolation (version 2) to generate 
an allele count for each population and site (Kofler et al. 
2011). All subsequent SNP-level analysis using these data 
tables was carried out in R (www.r-project.org, last accessed 
November 27, 2013). 

To identify informative SNPs, we first generated a list of 
positions at which we expect bi-allelic SNPs to be segregating 
among the eight founder haplotypes contributing to a given 
population; all other sites were ignored. We further excluded 
sites in the observed data where there was some evidence of 
the third allele (frequency of a third allele >0.05 given cover- 
age >10), where coverage was very high (>2,000 in control 
and old samples combined in A or B), or where we obtained 
zero coverage in either the control or old sample. Ultimately, 
we ended up with -1.2 M SNPs in the A populations and 
-1.1 M SNPs in the B populations. We note that as we 
know all the SNPs potentially segregating in the populations 
by virtue of their founders being known and sequenced, SNP 
ascertainment is straightforward. 

Quantifying Differences 
SNP-Level Analysis 

To quantify SNP frequency differences in the data, we first 
calculated the absolute differences in minor allele frequencies 
at all of our identified SNPs. We define the minor allele as the 
least common allele across our eight founder haplotypes, so 
the minor allele is not necessarily the least common allele per 
position in the observed data. To take linkage into ac- 
count, we calculated sliding-window averages of the abso- 
lute differences (SWAD) with a window size of 200 
consecutive SNPs and a step size of 50 SNPs. As average 
coverage per population is high (supplementary fig. S1, 
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Supplementary Material online), we did not weigh the allele 
frequency difference by coverage. 

Haplotype-Level Analysis 

We used our SNP data and the consensus sequences of the 
founders to estimate the most likely set of founder haplotype 
frequencies in the control and old populations across the 
genome. We estimated these frequencies using a 201 SNP 
sliding window with a step of 50 SNPs. At a given genomic 
position, we used the set of 100 SNPs on either side of that 
position to determine the most likely set of founder haplotype 
frequencies that would produce the observed set of SNP fre- 
quencies. We considered the set of founder haplotype fre- 
quencies that minimized the following quantity to be the 
most likely set of founder haplotype frequencies at the focal 
position: 

where Q is the sequence coverage at the /'th SNP in the 
window, MAF/ is the minor allele frequency at the /th SNP, 
fij is the allelic state (1 = minor allele, 0 = major allele) of the 
y'th founder at the /th SNP, and h,j is the haplotype frequency 
of the /th founder at the /th SNP. The set of eight haplotype 
frequencies (h) for the window are the quantities being opti- 
mized. Optimization was achieved using the optim function in 
R. The second part of the above equation serves to constrain 
the set of haplotype frequencies such that the sum of the 
eight founder haplotype frequencies cannot exceed 1 and it 
reduces to near zero when the founder haplotype frequencies 
sum to 1 . Individual haplotype frequencies were bounded by 0 
and 0.95. Individual haplotype frequencies were bounded by 
0.95 rather than 1 because bounding by 1 occasionally led 
to local convergence with a single haplotype frequency equal- 
ing 1 . The term preventing the sum of the haplotype frequen- 
cies exceeding 1 severely limits the search space when 
one haplotype frequency is equal to 1, leading to these local 
convergences. This phenomenon is completely prevented by 
bounding the individual haplotype frequencies at 0.95. 

We found that the haplotype estimator was not able to 
accurately estimate the haplotype frequencies if two or more 
founder haplotypes are highly correlated (i.e., cannot be dis- 
tinguished from one another over the 201 SNP window). In 
this case, the haplotype estimator can produce large isolated 
differences in D between control and old pools, as for the two 
indistinguishable haplotypes only their sum is constrained. To 
prevent these spurious differences from arising, when two or 
more haplotypes were highly correlated with one another 
(>0.9), we instead estimated a single combined frequency 



of the correlated haplotypes using the average allelic states 
of the correlated haplotypes in our haplotype estimator as f,- 
for that set of haplotypes At these positions, our haplotype 
estimator produces fewer than eight founder frequencies 
depending on the number of correlated haplotypes. 

To identify positions with overall divergent haplotype fre- 
quencies in the control versus old pools, we calculated the 
squared difference in haplotype frequencies between them. 
This measure of Euclidean distance between two vectors is a 
general approach but widely used in distance-based inference 
in biology; for example, to analyze gene expression data from 
microarrays (Shannon et al. 2003). Our test statistic, hereafter 
D, is essentially the average percent distance between haplo- 
types at a given position: 

ITU 1 (h 0 i - h Y if 

100 -y n — (2) 

where h Q j is the haplotype frequency of the y'th founder in the 
old population, h Y j is the haplotype frequency of the y'th foun- 
der in the younger control pool, and n is the number of hap- 
lotypes estimated at that position. Typically n is 8, but as noted 
earlier, when haplotypes are correlated, they are combined 
and fewer than eight frequencies are estimated. We then 
used the loess smoothing function in R with a span of 0.01 
to smooth across genetic distance. This process tempers any 
highly localized fluctuations, which are not expected based on 
our expected size distribution of nonrecombined haplotypes 
at generation 100. Autosomal segment sizes are expected to 
follow an exponential distribution with an expected median 
size of 1.4 cM at generation 100, although segment sizes 
at generation 50 were slightly larger than the theoretical 
expectation. 

Determining Statistical Significance 

A large set of RILs generated from the synthetic populations 
allowed us to generate a null distribution of both D and the 
SWAD for the purpose of determining statistical significance. 
Briefly, -500 RILs were created from each synthetic subpop- 
ulation at generation 50 via 25 generations of full-sib mating 
(for details on the creation of these RILs, see King, Merkes 
et al. 2012). The complete underlying founder haplotype 
structure of these RILs is known. King, Macdonald et al. 
(2012) describe the implementation of a hidden Markov 
model incorporating dense genotype data for the RILs and 
the founder genome sequences to infer the founder haplo- 
type at each position in each RIL. Because the haplotype struc- 
ture is known, we can infer the genotype of each RIL at each 
SNP by using the haplotype assignments and the founder 
consensus sequences. For a given RIL and position, the 
hidden Markov model results in a probability assignment for 
each founder haplotype indicating the likelihood of the ge- 
netic material at that position is derived from that founder 
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haplotype. The probability a given RIL harbors the minor allele 
(m z ) at a given position is 

m z = J2 f j' Pj ( 3 ) 

where fj is the allelic state (1 = minor allele, 0 = major allele) of 
the yth founder and p y is the probability the ancestry of the RIL 
is the yth founder at that position. The estimated minor allele 
frequency in a given population of RILs at a given position is 
the mean of m z across RILs. Therefore, for any given set of 
RILs, we can estimate the SNP frequency at every position and 
then use these SNP frequencies to calculate SWAD or D as 
described earlier. 

To generate null distributions of SWAD and D, we per- 
formed 5,000 iterations of a Monte Carlo simulation by sam- 
pling with replacement two sets of 200 RILs (corresponding 
to the young and old pools, with 200 haploid genomes the 
minimum number of copies in each sample, cf . supplementary 
table S1, Supplementary Material online) for each subpopula- 
tion. Then, we calculated the SNP frequencies in each set and 
ran the haplotype estimator on both sets and calculated our 
SWAD and D statistics. Generally, the idea is to create null 
distributions of SWAD and D that result from random sam- 
pling of two sets of 100 individuals alone (i.e., no true differ- 
ence in frequency). Supplementary figure S2, Supplementary 
Material online, provides a visual overview of this strategy. The 
vectors of coverages observed in the real data were also used 
to estimate frequencies in the simulated data for consistency; 
in other words, we drew Q (the observed coverage at the /th 
SNP, in either the old or young pool) alleles with replacement 
from the corresponding old or young simulated pool in each 
Monte Carlo iteration. Thus, we model stochasticity in the 
random draws of genome-wide haplotypes from each syn- 
thetic population, as well as the variation in sequence cover- 
age commonly observed in Next-Gen data sets. 

Although this strategy of using Monte Carlo-based sam- 
pling of in silico sequenced RILs may not appear intuitive, we 
find it the most appropriate method generating appropriate 
null sampling distributions. For example, tests based on per- 
muting reads between old and young pools are overly liberal 
when individuals are not barcoded. A test designed to detect 
differences between two finite pools drawn from a large pop- 
ulation must detect differences in haplotype (or SNP) frequen- 
cies above and beyond that generated purely by the sampling 
process, and permuting reads between samples loses this 
sampling information. In addition, permuting reads between 
pools can potentially destroy longer range information about 
linkage disequilibrium, and this information is important in a 
study such as this one where the long-range haplotype infor- 
mation can be important. By contrast, the Monte Carlo 
RIL-based approach allows the sampling of entire haplotypes 
(i.e., entire RILs) and thus controls for the finite sample of 
individuals used to make the pools and retains important as- 
sociations that should be factored into the generation of a null 



distribution of a test statistic. Had pooled gDNA libraries had 
been constructed of barcoded individuals whose genomes 
could be parsed out, the sequences of these entire individuals 
could be permuted between control and old pools. This would 
likely be the most appropriate strategy for determining signif- 
icance, although it presents a considerable practical challenge, 
specifically the costs involved in making the -1,000 libraries 
implied by the ~ 1,000 individuals examined in this study. As 
the technology for highly multiplexed Next-Gen sequencing 
projects continues to improve, this could become a viable 
approach in the near future. 

SNP-Level Analysis (SWAD) 

The mean SWAD of the Monte Carlo iterations varied sub- 
stantially across the genome, suggesting that the use of a 
genome-wide significance threshold would be overly conser- 
vative for much of the genome. Instead, we used the 5,000 
SWAD values generated at each position to calculate position- 
specific P values for each subpopulation as follows: 



where k-, is the number of iterations exceeding the observed 
SWAD value at position / and N is the total number of itera- 
tions (5,000). Computation time limited the number of itera- 
tions we were able to perform and the number of iterations 
places a limit on our P values. The lowest P value we can 
obtain is when k,- equals zero, producing a P value of 
0.0002. This P value is greater than what would be obtained 
as a 5% significance threshold after correcting for multiple 
tests, preventing us from obtaining individually genome-wide 
significant positions. Because we could not use individual 
P values to determine significance, we identified regions en- 
riched for low P values (<0.005). To do this, we broke the 
genome into 0.5-cM intervals and determined the proportion 
of P values below 0.005 within each interval (fig. 2). 

Haplotype-Level Analysis 

We were able to determine genome-wide significance thresh- 
olds for the haplotype level analysis. First, we smoothed each 
Monte Carlo iteration in the same way as the observed data, 
using the loess smoothing function in R with a span of 0.01, 
smoothing across genetic distance. The resulting mean of the 
Monte Carlo iterations is quite stable across the genome, as is 
the per position 0.995 quantile. We then used the peak finder 
function msPeakSimple from the msProcess library in R with a 
span of 50 and a signal-to-noise threshold of 2 to identify 
distinct peaks across the genome. For a wide range of 
D values, we quantified the number of distinct peaks per 
genome scan for each Monte Carlo iteration that exceeded 
that D. We could then calculate the number of observed 
peaks exceeding a given D threshold per Monte Carlo 
genome scan. The D that corresponds to 0.05 peaks per 
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Fig. 2. — Empirical P values from the SNP-level analysis. Black points are the observed empirical - log 10 (P values) plotted with some transparency such 
that multiple points plotted in the same location appear darker. Pink points are the empirical - log 10 (P values) obtained from a single Monte Carlo iteration to 
show the variation obtained by chance alone. The blue line is the proportion of P values below 0.005 in 0.5-cM intervals across the genome. Green lines at 
the base of the plot denote the region of interest for any peaks exceeding the 50% threshold, as determined by the haplotype analysis (for comparison with 
fig. 3). Light green lines denote peaks exceeding the 50% threshold, whereas dark green lines denote peaks exceeding the 5% threshold. The thick portion 
of the line denotes a 2-cM interval and the thin portion denotes a 5-cM interval. 



genome scan is our threshold corresponding to a 5% false- 
positive rate (FPR). We also present a more liberal 50% thresh- 
old (i.e., one peak every other genome scan). To localize a 
region of interest for each identified peak, we used a standard 
genetic distance of 1 cM on either side of the highest D value 
(interval spanning 2.5 cM on either side also shown; supple- 
mentary figs. S3a-h and SA-a-h, Supplementary Material 
online). 

Results and Discussion 

Longevity Assay 

The synthetic recombinant populations (fig. 1) were synchro- 
nized and used to establish cohorts of 12,000 individuals per 
mortality. Approximately 98% of the females in each cohort 
died within 62-66 days posteclosion (fig. ^b and supplemen- 
tary table S1, Supplementary Material online), and the remain- 
ing 2% of surviving females were sacrificed alive and used to 
create four pooled old gDNA libraries. These pools were 
sequenced alongside four control gDNA libraries consisting 
of the same number of females sacrificed young, within 
24 h of eclosion. We obtained an average coverage of approx- 
imately 80 x at the positions we considered SNPs per each of 



the four populations and two time-points examined (supple- 
mentary fig. S1 , Supplementary Material online). The synthetic 
populations were founded from a total of 1 5 highly inbred 
and completely resequenced founder strains; the SNPs of this 
study thus correspond to those SNPs known to be segregating 
in the populations based on their being identified in the foun- 
ders. Although many SNPs are shared between populations, 
alleles private to the A or B populations are common as the 
A and B populations are derived from different founders. 

As the choice to retain the oldest 2% of surviving females 
was based on human population data, and not informed by 
Drosophila biology, a cursory analysis of aging trajectories in 
the assay populations is merited. Evolutionary theory predicts 
that mortality should increase in an approximately exponen- 
tial fashion through the first part of adult life and then pla- 
teau at some advanced age (Mueller and Rose 1996). This 
plateau is called the "late-life" phase of adulthood, and it is a 
feature of all aging populations. The age at which this pla- 
teau starts is considered the transition between aging and 
late-life and is termed the "breakday" (Mueller and Rauser 
201 1). In human populations in developed nations, estimates 
of breakday range from 90 to 110, suggesting that cente- 
narian individuals in genome-wide association studies 
(GWAS) panels are near the transition from aging to late-life 
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Fig. 3. — D across the genome. Different chromosome arms are denoted by different background shades (white/gray). The solid black line is the D for the 
observed data, the dotted blue line is the 99.5% per position quantile, and the dashed black lines are the genome-wide threshold for an alpha of 0.05 and 
0.5 (noted as 5% and 50% on the right axis). The pink line is the D for a single Monte Carlo iteration to show the variation obtained by chance alone. Dark 
green lines at the base of the plot denote the region of interest for any peaks exceeding the 5% threshold, and light green lines denote the region of interest 
for any peaks exceeding the 50% threshold only. The thick portion of the line denotes a 2-cM interval and the thin portion denotes a 5-cM interval. 



(Greenwood and Irwin 1939; Mueller et al. 2011). We esti- 
mate that all four of our populations were in late-life at the 
end of the mortality assay when individuals were sampled for 
sequencing and thus old enough to merit their use in an 
association study of extreme old age. We fit our mortality 
data to the evolutionary model of late-life using maximum- 
likelihood estimates (Shahrestani et al. 201 2). We found that 
for all four of our populations, the estimated breakday 
occurred just before the time of "centenarian" sampling 
(supplementary fig. S5, Supplementary Material online). 

Genomic Regions of Differentiation 

We identify 10-30 genomic regions, per population, of local- 
ized high SNP frequency differentiation between the control 
and old pools (fig. 2). Patterns of allele frequency differentia- 
tion in each population are evidently unique, although we do 
observe generally high SNP frequency differentiation near cen- 
tromeres and at the ends of chromosomes. We used observed 
SNP frequencies in the pools to estimate the frequencies of the 
each founder haplotype at different locations across the 
genome (fig. 3). We developed a statistic, D (the Euclidean 
distance between the two haplotype frequency vectors), that 
summarizes total haplotypic divergence between control and 
old flies within each population. At a genome-wide alpha of 
5% (D> 7.9), we find one significant peak in A1, six in A2, 



zero in B1, and one in B2 (table 1). At a more liberal genome- 
wide alpha of 50% (D> 6.6), we find two additional peaks 
per population (supplementary table S2, Supplementary 
Material online). Peaks in D are local and isolated, with 
elevated D values spanning <2 cM from each peak's center 
(table 1 and supplementary table S2, Supplementary Material 
online). Alternatively, analyzing SNP frequency differences 
alone leads to narrower peaks roughly half this size. 
Although this is potentially valuable in terms of localizing 
putatively causative variants, we frequently observe large 
SNP differentiation in regions without apparent high values 
of D (most notably in population B1). We regard these peaks 
as likely false positives, consistent with comparisons of SNP 
versus haplotype divergence in the context of "collaborative 
cross" genetic mapping studies (Valdar et al. 2005; Aylor et al. 
201 1). As haplotype peaks should therefore be more reliable 
than the peaks generated by individual SNP frequencies, we 
chose to only identify genes/functional groups under the more 
inclusive haplotype peaks. 

We find a total of 1,654 genes occurring under the eight 
peaks significant at a 5% FPR (table 1 ). Five of these haplotype 
peaks occur in regions of suppressed recombination, near cen- 
tromeres or chromosome ends (cf. Fiston-Lavier et al. 2010, 
fig. 1). Population genetics theory predicts that genomic re- 
gions of suppressed recombination should harbor a greater 
number of unconditionally deleterious alleles at higher 
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Table 1 

Summary of "Peaks" Exceeding Our 5% FPR or Genomic Regions with Values of D>7.9 



Pop 


Chr 


Peak Position 


Peak Position 


cM/MB 


2 cM Centered 


D 


PropP at 


No. of Genes 


Supp. Figure 






(Physical) 


(cum. cM) 




on Peak Position 




Nearest 0.5 cM 


under Peak 




A2 


2R 


2R:1 1,845,275 


142.997 


4.24 


11,606,503-12,082,950 


8.99 


0.543 


53 


3a and 6a 


A2 


2R 


2R:14,325,000 


151.864 


3.01 


13,980,492-14,681,997 


9.49 


0.477 


90 


3b and 6b 


A2 


2L 


2L:2,053,196 


70.916 


2.95 


1,695,797-2,384,371 


8.55 


0.255 


98 


3c and 6c 


A1 


2R 


2R:20,757,872 


173.280 


2.00 


20,297,228-21,100,988 


10.71 


0.000 


100 


3d and 6c/ 


A2 


2R 


2R:6,01 6,304 


127.894 


1.94 


5,501,046-6,534,300 


9.29 


0.898 


110 


3e and 6e 


A2 


2L-2R 


2R:228,561 


121.116 


1.84 


19,958,857-2,789,898 a 


14.61 


0.648 


723 


3f and 6f 


A2 


2L 


2L: 15,038,696 


116.880 


1.17 


14,257,961-16,026,513 


9.12 


0.406 


137 


3g and 6g 


B2 


3L 


3L: 16,750,434 


218.032 


0.79 


15,623,323-18,194,769 


7.92 


0.190 


343 


3b and 6b 



Note. — The peak locations in this table correspond to supplementary figs. S3a-h and S6a-h, Supplementary Material online (in order listed here). Shaded peaks are in 
regions of normal recombination, and nonshaded peaks are in regions of reduced combinations near centromeres or chromosome ends. 
a Peak spans centromere. 



equilibrium frequencies, because purifying selection operates 
less effectively in such regions due to Muller's ratchet (Muller 
1 964). Thus, our observation of elevated D values occurring in 
low-recombination regions of the genome is perhaps intuitive. 
This observation is also notable in the context of the mutation 
accumulation evolutionary theory of ageing, which posits that 
unconditionally deleterious alleles can accumulate if their 
fitness effects are confined to postreproductive periods of 
life when selection against them is weak (Medawar 1952; 
Charlesworth 1994). Investigators have garnered substantial 
empirical evidence in support of mutation accumulation, in 
Drosophila as well as other taxa (Hughes 2010). Genomic 
regions exhibiting high D, but low recombination rates, are 
therefore good candidates for harboring variants impli- 
cated under the mutation accumulation theory of ageing. 
Unfortunately, as these regions also implicate a huge 
number of genes, they are not amenable to singling out 
individual candidates. For this reason, we restrict further 
discussion to the three peaks located in regions of relatively 
high rates of recombination, >2 cM/MB (table 1, shaded 
region). 

Notably, half of our observed significant regions occur in a 
single population (A2), although it is difficult to imagine a 
biological reason for this observation. We examined the raw 
genotype data from A2 for evidence of contamination from 
feral flies that accidentally became incorporated to the 
A2 sample during the assay preparation or DNA extraction. 
At least two lines of evidence suggest that there is no such 
contamination. All synthetic populations were assayed for the 
presence of P elements in the generation preceding the ex- 
periment; all were confirmed to be P-element free, suggesting 
no contamination by feral, P-element harboring flies. Also, less 
than 0.1% of the observed polymorphic positions in the A2 
genome were predicted to be monomorphic from the foun- 
der consensus sequence data (i.e., site at which all founders 
share the same allele at that locus), and this percentage 
was similarly low across the four synthetic populations. 



This suggests that all four synthetic populations remain uncon- 
taminated by flies of different founder ancestry. 

Genes under Significant Peaks 

We find 270 genes total in the three 2-cM windows corre- 
sponding to the 5% FPR peaks in regions of normal recombi- 
nation (supplementary fig. S3a-c, Supplementary Material 
online). Among the half of these 270 genes that have been 
annotated with Gene Ontology (GO) terms based on experi- 
mental evidence, the most common GO Biological Process 
term is "defense response," shared by ten genes. This term 
appears to be modestly enriched in the data set (10/270 gen- 
es =3.7% vs. a genome-wide instance of 206/16,085 gen- 
es = 1 .2%). The second most common term among our 270 
genes is "glutathione metabolic process," but this is a special 
consequence of the nine genes of the glutathione transferase 
family that are clustered together on chromosome 2R. 
Although we do not think it appropriate to rely on GO term 
enrichment as a conclusive approach for the identification of 
candidate genes, we do find it useful in a general descriptive 
sense. Both defense response genes and the glutathione 
transferase family could have relevance for longevity. Most 
of the identified defense response genes have specific 
known immune functions following exposure to pathogenic 
bacteria. As insects age, they accumulate injuries (Burkhard 
et al. 2002) and are therefore more susceptible to opportu- 
nistic infection by pathogens. Variation in immune response 
genes could therefore causally affect longevity. Glutathione 
transferases inactivate damaging secondary metabolites, 
such as hydroperoxides, formed during oxidative stress 
(Hayes and Flanagan 2005). Free radical-scavenging enzymes 
such as superoxide dismutase (Sod) and catalase have long 
been implicated in aging (Harman 1956). Experimentally, 
these enzymes have been shown to increase lifespan 
when overexpressed in Drosophila (Orr and Sohal 1994), 
and laboratory-selected populations of flies with different 
evolved longevity phenotypes exhibit different frequencies of 
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allozymic variants with different levels of activity at such loci 
(Deckert-Cruz et al. 1997). Glutathione transferases fall into 
this category of antioxidant enzymes, and thus the genes that 
encode them could have consequences for lifespan. 

Although our gene list is ripe for the validation of candidate 
longevity genes, perhaps the list's most interesting feature is 
an absence of genes previously described in the literature. 
There are 130 D. melanogaster genes with a priori GO 
terms associated with aging (www.flybase.org), and only 
one of these, Atg7, occurs in our gene list. Atg7 is an autop- 
hagy gene known to decrease in expression with age, leading 
to an accumulation of damaged proteins in tissues (Demontis 
and Perrimon 2010). Notably, oft-cited "longevity" genes, 
such as mth (Lin et al. 1998) and indy (Rogina et al. 2000), 
shown to increase lifespan when mutated, are not associated 
with regions of haplotype or SNP divergence in control versus 
old flies. It is possible that naturally occurring variants in these 
genes contribute to longevity in a subtle manner and cannot 
be detected via our approach, or that natural variation in the 
expression of these genes is largely due to frans-regulatory 
genes. It is also possible that longevity-increasing mutations 
of large effect in these genes have pleiotropic consequences 
for early-life fitness components such as fecundity, preventing 
them from contributing to segregating variation in our test 
populations. 

Replication 

The three regions of high differentiation in regions of normal 
recombination all occur in population A2. Thus, none of our 
candidate genes are implicated in other synthetic populations, 
a pattern analogous to that seen in human centenarian stud- 
ies, where associations seem to be limited to single popula- 
tions. However, it is fair to note that when considering regions 
identified at lower levels of statistical stringency (FPR > 50%), 
we frequently observed regions of high differentiation within 
2-5 cM of each other in two different populations (table 1 and 
supplementary table S2, Supplementary Material online). This 
is the case for two of the three highly significant regions iden- 
tified in A2: one hit on chromosome 2R (supplementary 
fig. S3b, Supplementary Material online) is located 0.5 cM 
away from a less significant peak in the B2 population (sup- 
plementary fig. S4d, Supplementary Material online), and an- 
other hit on chromosome 2L (supplementary fig. S3c, 
Supplementary Material online) is located less than 3 cM 
away from a less significant peak in the A1 population (sup- 
plementary fig. S4e, Supplementary Material online). On one 
hand, this replication can be considered additional evidence 
for alleles affecting longevity in these genomic regions. On the 
other hand, the observation that regions are not implicated at 
the same high levels of significance in replicate populations 
suggests a limit to this study's ability to detect such alleles. The 
degree to which associations are limited to single populations 



due to truly independent causative variants versus a lack of 
power therefore remains unclear. 

To address the issue of exactly how differentiated our rep- 
licate populations are in general, we calculated F 5t values for 
pairwise comparisons between each of our four synthetic pop- 
ulations using our SNP data in a standard F st equation (Hartl 
and Clark 2007). We find the pairwise comparisons between 
the "pure" replicates from the same original cross to be smal- 
ler than the comparisons between populations from different 
ancestral crosses: A1/A2 = 0.09; B1/B2 = 0.07; average 
A/B = 0.14 (supplementary table S2, Supplementary Material 
online). Although this is expected, it is notable that the F st 
values between some human populations exceed those mea- 
sured in our pure replicates (Nelis et al. 2009) (supplementary 
table S3, Supplementary Material online). This provides some 
context for the general failure to replicate "extreme longev- 
ity" loci in human association studies. If we cannot implicate 
the same regions between Drosophila populations that have 
diverged for approximately 100 generations, it similarly may 
not be reasonable to expect human centenarian studies of 
populations with comparable degrees of shared ancestry to 
produce replicable results. 

The degree to which we expect to see replicable differences 
among populations remains unclear. There are two salient 
considerations: 1) replication is also rare among human 
genome-wide association study panels of sizes comparable 
to those used here; and 2) the populations used in this 
study are not true replicates. To address the first issue, in as- 
sociation studies where power is low (e.g., with sample sizes 
of <500 individuals), simulation studies have shown that rep- 
lication is often difficult to achieve, even when attempting to 
replicate associations in the same population (Long and 
Langley 1999). It is generally acknowledged that genome- 
wide associations typically only become replicable in panels 
of -2,000 cases versus controls (Wellcome Trust Case 
Control Consortium 2007), and of course this sample size is 
difficult to achieve in studies of the extremely old. 

The second issue to consider is the appropriateness of our 
populations as biological replicates. Although our population 
pairs do share ancestry, this is no guarantee that they are 
currently segregating the same causative alleles. To illustrate 
this idea, we queried a locus (chr3L:1 1 105723) at which a 
nonsynonymous SNP affects the activity of Sod, an enzyme 
with known effects on aging from both transgenic experi- 
ments (Orr and Sohal 1994) and studies of selectively bred 
populations (Deckert-Cruz et al. 1997). This position is poly- 
morphic in both founder populations, with a minor allele 
count of two of eight in the A founders and one of eight 
in the B founders. Despite this nonsynonymous SNP initially 
segregating in all four populations, we only detect this posi- 
tion as polymorphic in the single A2 synthetic population 
examined in our experiment after 100 generations of selec- 
tion and drift. That is, the minor allele of Sod has been sto- 
chastically lost in the other three replicate populations, not 
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an uncommon occurrence in the DSPR (cf. King, Merkes 
et al. 201 2, fig. 2). Although we do not observe this nonsyn- 
onymous SNP to be significantly differentiated in the A2 
cohort (1 5% in the control vs. 23% in the old pool), it nev- 
ertheless prompts the question of how many potentially 
functional alleles have been lost from one of our two popu- 
lation pairs. To visualize this, we went through the most 
significant peaks (from table 1) and looked at individual 
founder haplotype frequencies contributing to change be- 
tween the young and old pools (supplementary fig. S6a-h, 
Supplementary Material online). In most of these peak re- 
gions, it is very clear that certain haplotypes are estimated at 
near-zero frequencies in one of the control pools versus the 
control pool of the corresponding replicate population. For 
example, the peak centered on chr2R:1 1,845,275 is signifi- 
cant in A2 but not in A1. Supplementary figure S6a, 
Supplementary Material online, shows that founder haplo- 
type 5 occurs at intermediate frequency in the control A2 
pool and at a somewhat higher frequency in the A2 old pool, 
suggesting that this haplotype likely contributes to the sig- 
nificant D value in the A2 population. However, this haplo- 
type is virtually not present in the A1 control pool. As 
haplotype 5 has been lost (or is rare) in the A1 pool, it 
cannot increase as the pool ages, which is perhaps why we 
do not estimate a significant D value in this replicate popu- 
lation. This phenomenon appears to be occurring in the ma- 
jority of cases (supplementary fig. S6a-g, Supplementary 
Material online; for a possible counterexample, see supple- 
mentary fig. S6h, Supplementary Material online). 

SNP Differentiation at a Finer Scale Than 
Haplotype Differentiation 

Although regions of high SNP differentiation are more nu- 
merous than regions with high D values, they tend to span 
smaller physical distances along the chromosome. These 
peaks could thus potentially be informative, when used 
alongside D peaks, for localizing putative causative regions. 
To evaluate this, we looked at our three major regions of 
interest and evaluated them for evidence of smaller SWAD 
peaks contained within the haplotype peak that might fur- 
ther localize candidate regions. For two of these peaks (sup- 
plementary fig. S3a and c, Supplementary Material online), 
the region implicated by SNP frequency difference is nearly 
identical to that implicated by D. However, for the third peak 
(supplementary fig. S3£>, Supplementary Material online), the 
SNP frequency differences implicate a 1-cM region, rather 
than the entire 2-cM region spanned by the haplotype peak. 
To evaluate how this localization impacts our previous con- 
clusions vis-a-vis our candidate genes, we removed the 1 -cM 
region of relatively low SNP frequency differences from our 
analysis. This region resolves to 60 genes, and omitting these 
genes from our list of 270 does not change the most 
common GO terms observed. However, the single a priori 



longevity gene we observed, Atg7, is one of the 60 genes 
removed. This weakens the case for this gene being impor- 
tant in our study and perhaps lends more credence to the 
idea that so-called "aging" genes, which traditionally have 
been identified on the basis of mutant screens, are not reli- 
able candidates for studies of longevity in outbred flies. 

Conclusions 

So what is the relevance of this work for human association 
studies of centenarians? The number of our cases, -120 ex- 
tremely old individuals per population, is not as large as what 
we see in the best human studies. That being said, the 
strengths of this study are 1) four cohorts handled identically, 
2) a controlled environment in which to measure the pheno- 
type, and 3) appropriate controls collected from within the 
same cohort. These experimental design features are unachie- 
vable in human studies. In addition, the synthetic population 
resource used here provides haplotype information that al- 
lowed us to identify putative candidate regions that SNP- 
level analysis alone may not have identified. In our data set, 
the SNP-level analyses appeared to result in many more dif- 
ferentiated regions than the haplotype-level analysis. Previous 
studies contrasting haplotype-level analyses with marker-level 
analyses have shown that marker-level analyses can lead to 
spurious results through simple random sampling (Aylor et al. 
201 1) and that marker-level analyses are prone to larger map- 
ping location errors (Valdar et al. 2005). Thus, there is value in 
applying the GWAS approach to populations derived from a 
limited number of founders and mapping effects back to 
founder haplotypes as opposed to SNPs. By focusing on a 
limited pool of founder haplotypes, it is apparent that different 
haplotypes have different effects that SNPs alone cannot enu- 
merate. This observation may explain some of the population 
heterogeneity seen in human longevity GWAS panels. 

Our observation that five out of eight regions with signifi- 
cant effects on longevity are in regions of suppressed recom- 
bination, which are much more likely to harbor unconditionally 
deleterious alleles of large effect with elevated minor allele 
frequencies than regions of normal recombination, lends sup- 
port to the mutation accumulation hypothesis for variation in 
longevity. This observation furthermore suggests that telo- 
meric and centromeric regions may be fruitful places to look 
for genes that impact longevity in humans. Finally, we associ- 
ate bacterial defense and glutathione transferase genes with 
extreme longevity, suggesting that standing variation impact- 
ing longevity in outbred populations may have a different 
genetic basis than genes identified via forward screens for 
mutants of large effect, perhaps due to early-life trade-offs. 

Supplementary Material 

Supplementary figures S1-S6 and tables S1-S3 are available 
at Genome Biology and Evolution online (http://www.gbe. 
oxfordjournals.org/). 
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